Refactor cyberowl core code #32
src/main.py
Outdated
-    except Exception:
-        raise ValueError("Error in the spiders!")
+    except Exception as exc:
+        raise ValueError("Error in the spiders!") from exc
Can you please explain what `from exc` does here?
`from exc` here records the cause of the exception: each time this exception is raised, the traceback also shows the exception that led to it.
However, I am still trying to find a way to implement it properly, because it currently seems unnecessary.
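To make the effect of `from exc` concrete, here is a minimal sketch. `run_spiders` is a hypothetical stand-in for the real crawling code, which this thread does not show. With `from exc`, the original error is stored on the new exception's `__cause__` attribute, and the traceback labels it as the direct cause.

```python
def run_spiders():
    # Hypothetical stand-in for the crawling code; it fails on purpose.
    raise ConnectionError("site unreachable")


def main():
    try:
        run_spiders()
    except Exception as exc:
        # "from exc" attaches the original error as __cause__ of the new one.
        raise ValueError("Error in the spiders!") from exc


try:
    main()
except ValueError as err:
    cause = err.__cause__
    print(f"{err!r} was caused by {cause!r}")
```

Without `from exc`, Python still keeps the original error in `__context__`, but the traceback then describes it as happening "during handling" of another exception rather than as the direct cause.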
src/mdtemplate.py
Outdated
class Template:
    """
    This class is used to format the data into a table in markdown format.
Can we add a description of the attributes, if possible?
Sure thing, but I think it would be better if we use `mdutils` for generating the markdown file, and delete this class. 😅
src/mdtemplate.py
Outdated
    source: str
    data: list

    def __init__(self, _source, _data):
        self.source = _source
        self.data = _data

    def _set_heading(self):
        return f"""---\n### {self.source} [:arrow_heading_up:](#cyberowl)\n"""

    def _set_table_headers(self):
        return """|Title|Description|Date|\n|---|---|---|\n"""

    def _set_table_content(self, title, link, description, date):
        return f"""| [{title}]({link}) | {description} | {date} |\n"""

    def fill_table(self) -> str:
        """
        Returns a table ready to be written to a file.
        """
        table = self._set_heading()
        table += self._set_table_headers()
        for row in self.data:
            table += self._set_table_content(
                row["title"], row["link"], row["description"], row["date"]
            )
        return table
    def __init__(self, _source: str, _data: list):
        # Store on private names so the read-only properties below do not recurse.
        self._source = _source
        self._data = _data

    @property
    def source(self) -> str:
        """Returns the source"""
        return self._source

    @property
    def data(self) -> list:
        """Returns the data"""
        return self._data

    @property
    def heading(self) -> str:
        """Returns the heading"""
        return f"""---\n### {self.source} [:arrow_heading_up:](#cyberowl)\n"""

    @property
    def table_headers(self) -> str:
        """Returns the table headers"""
        return """|Title|Description|Date|\n|---|---|---|\n"""

    def _set_table_content(self, title, link, description, date) -> str:
        """Returns a single table row"""
        return f"""| [{title}]({link}) | {description} | {date} |\n"""

    def fill_table(self) -> str:
        """
        Returns a table ready to be written to a file.
        """
        table = self.heading
        table += self.table_headers
        for row in self.data:
            table += self._set_table_content(
                row["title"], row["link"], row["description"], row["date"]
            )
        return table
Looks great to me!
src/pipelines.py
Outdated
    """
    AlertPipeline class
    """
Is an `__init__` missing here?
src/pipelines.py
Outdated
        Remove special characters from text.
        """
        return (
            text.replace("\n", "")
I'm not sure how far this list can grow, but I think you should create a list and loop over it to remove whatever you want, e.g.
special_characters = ['\n', '\r', ' ', '|']
return text.translate({ord(character): '' for character in special_characters})
This won't work in general, because `ord()` requires each `character` to be a single character, so multi-character strings cannot be removed this way.
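One way to reconcile the two comments above is sketched below, as an assumption rather than the code that was merged. It uses `str.translate` for single characters and plain `str.replace` for longer strings. Both lists and the `clean_text` name are hypothetical, since the full set of strings being stripped is not shown in this thread.

```python
# Hypothetical lists; the real set of strings to strip is not shown in the PR.
SINGLE_CHARACTERS = ["\n", "\r", "|"]
MULTI_CHARACTER_STRINGS = ["&nbsp;"]  # assumed example of a longer token


def clean_text(text: str) -> str:
    """Remove unwanted characters and tokens from text."""
    # str.translate works on single code points; mapping to None deletes them.
    table = {ord(character): None for character in SINGLE_CHARACTERS}
    cleaned = text.translate(table)
    # Longer strings cannot go through ord(), so fall back to str.replace.
    for token in MULTI_CHARACTER_STRINGS:
        cleaned = cleaned.replace(token, "")
    return cleaned
```

Growing either list then does not require touching the function body.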
src/pipelines.py
Outdated
    def open_spider(self, spider):
        """
        Open spider
Why does this function take `spider` as a parameter and then just set `result` to an empty list?
The function `open_spider` expects it as an argument; however, we can use `*args` and `**kwargs` instead.
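For context, here is a minimal sketch of why the parameter is there, assuming the hook signature that Scrapy documented for item pipelines at the time of this PR. The framework calls `open_spider(spider)` on the pipeline, so the method has to accept the argument even if it ignores it. The `result` attribute follows the question above; `process_item` is added only for illustration.

```python
class AlertPipeline:
    """Collects scraped items in memory (illustrative sketch only)."""

    def open_spider(self, spider):
        # Called once when the spider starts. The spider argument is unused
        # here but is part of the hook signature, so it must be accepted.
        self.result = []

    def process_item(self, item, spider):
        # Called for every scraped item; keep it and pass it along.
        self.result.append(item)
        return item
```

Keeping the explicit `spider` parameter documents the contract more clearly than `*args` and `**kwargs` would, at the cost of one unused name.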
src/spiders/cisa_spider.py
Outdated
    title_selector = "descendant-or-self::h3/span/a/text()"
    description_selector = "descendant-or-self::div[contains(@class,'field-content')]/p"

    def parse(self, response):
Isn't this function repeated in every spider? If so, can we move it somewhere it can be shared by all spiders?
Yes, it is, but it is a method of the abstract class `Spider`, so I guess we can't move it anywhere. Please let me know if you can suggest an implementation.
What I would do is add another layer of inheritance: abstract all the implemented spiders into one class whose arguments are the website URL and the selectors. Let me know what you think.
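The extra inheritance layer proposed above could look roughly like the sketch below. It is not the project's implementation. `BaseAlertSpider`, the placeholder URL, and the yielded item shape are all hypothetical. In the real project the base class would presumably subclass `scrapy.Spider`; it is left as a plain class here so the sketch runs without Scrapy installed. The only thing assumed about `response` is that `response.xpath(...).getall()` returns a list of strings.

```python
class BaseAlertSpider:
    """Hypothetical shared base; would subclass scrapy.Spider in practice."""

    name = ""
    start_urls = []
    title_selector = ""
    description_selector = ""

    def parse(self, response):
        # Shared parsing logic: subclasses only provide the URL and selectors.
        titles = response.xpath(self.title_selector).getall()
        descriptions = response.xpath(self.description_selector).getall()
        for title, description in zip(titles, descriptions):
            yield {"title": title, "description": description}


class CisaSpider(BaseAlertSpider):
    name = "cisa"
    start_urls = ["https://example.invalid/alerts"]  # placeholder URL
    title_selector = "descendant-or-self::h3/span/a/text()"
    description_selector = (
        "descendant-or-self::div[contains(@class,'field-content')]/p"
    )
```

The sketch uses class attributes rather than constructor arguments because the crawler usually instantiates spiders itself; either choice keeps `parse` in one place.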
Really nice changes!! Looking forward to seeing this merged 🚀🚀
I have left a few comments.
L3zzz!! I'll be adding more changes to review 🙌😅
Kudos, SonarCloud Quality Gate passed! 0 Bugs, no coverage information.
Description
- `poetry` instead of `venv`.
- Use `mdutils` for markdown generation.
- Create Spiders Abstraction class (Static / Dynamic scrapers).
Type of change